Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
Authors
Abstract
This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V − 1), at least in some particular cases, suggesting that performance improves greatly from V = 2 to V = 5 or 10 and is then almost constant. Overall, this can explain the common advice to take V = 5, at least in our setting and when computational power is limited, as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
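The V-fold criterion discussed in the abstract can be illustrated concretely. The sketch below (illustrative only; function names, the histogram model collection, and the simulated data are our own assumptions, not the paper's code) estimates the least-squares risk of a histogram density estimator, int f_hat^2 − 2 E[f_hat(X)], by averaging the held-out criterion over V folds, and uses it to select a number of bins:

```python
import numpy as np

def ls_cv_score(data, edges, V=5, seed=None):
    """V-fold cross-validation estimate of the least-squares risk of a
    histogram density estimator with the given bin edges. The risk of
    f_hat is int f_hat^2 - 2 E[f_hat(X)]; the expectation is estimated
    on each held-out fold."""
    rng = np.random.default_rng(seed)
    n = len(data)
    folds = np.array_split(rng.permutation(n), V)
    widths = np.diff(edges)
    scores = []
    for k in range(V):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(V) if j != k])
        # Histogram density fitted on the V-1 training folds
        counts, _ = np.histogram(data[train_idx], bins=edges)
        f_hat = counts / (len(train_idx) * widths)
        # Quadratic term: int f_hat^2
        sq = np.sum(f_hat**2 * widths)
        # Linear term: 2 * mean of f_hat over the held-out points
        bins = np.clip(
            np.searchsorted(edges, data[test_idx], side="right") - 1,
            0, len(widths) - 1)
        lin = 2.0 * np.mean(f_hat[bins])
        scores.append(sq - lin)
    return np.mean(scores)

# Select the number of bins minimizing the V-fold criterion (V = 5)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
candidates = [4, 8, 16, 32, 64]
crit = {m: ls_cv_score(x, np.linspace(x.min(), x.max(), m + 1), V=5, seed=1)
        for m in candidates}
best = min(crit, key=crit.get)
```

Per the variance factor 1 + 4/(V − 1) computed in the paper, raising V from 2 to 5 already removes most of the extra variance, which is why larger V brings little further benefit in this sketch's setting.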
Similar resources
V-fold Cross-validation and V-fold Penalization in Least-squares Density Estimation
This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares risk of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization), with an upper bound decreasing as...
Appendix to the Article "Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation"
This appendix is organized as follows. The first section (called Section B, for consistency with the numbering of the article) gives complementary computations of variances. Then, results concerning hold-out penalization are detailed in Section D, with the proof of the oracle inequality stated in Section 8.2 (Theorem 12) and an exact computation of the variance. Section E provides complements o...
Robust Cross-Validation Score Functions with Application to Weighted Least Squares Support Vector Machine Function Estimation
In this paper, new robust methods for tuning regularization parameters or other tuning parameters of a learning process for non-linear function estimation are proposed: repeated robust cross-validation score functions (repeated robust V-fold CV) and a robust generalized cross-validation score function (GCVRobust). Both methods are effective for dealing with outliers and non-Gaussian noise distr...
Semiparametric multivariate density estimation for positive data using copulas
In this paper we estimate density functions for positive multivariate data. We propose a semiparametric approach. The estimator combines gamma kernels or local linear kernels, also called boundary kernels, for the estimation of the marginal densities with parametric copulas to model the dependence. This semiparametric approach is robust both to the well-known boundary bias problem and the curse...
A Comparison of Cross-Validation Techniques in Density Estimation
In the setting of nonparametric multivariate density estimation, theorems are established which allow a comparison of the Kullback-Leibler and the Least Squares cross-validation methods of smoothing parameter selection. The family of delta sequence estimators (including kernel, orthogonal series, histogram and histospline estimators) is considered. These theorems also show that eithe...
Journal:
Journal of Machine Learning Research
Volume 17, Issue
Pages: -
Publication date: 2016